Idiomatic MWEs and Machine Translation. A Retrieval and Representation Model: the AraMWE Project
Abstract
This paper presents a preliminary implementation of AraMWE, a hybrid project that combines a statistical component with a CCG symbolic component to extract and treat MWEs and idioms in Arabic and English parallel texts. It gives a general sketch of the system, a thorough description of the statistical component, and a proof of concept of the CCG component.
Similar resources
A Word Embedding Approach to Identifying Verb-Noun Idiomatic Combinations
Verb–noun idiomatic combinations (VNICs) are idioms consisting of a verb with a noun in its direct object position. Usages of these expressions can be ambiguous between an idiomatic usage and a literal combination. In this paper we propose supervised and unsupervised approaches, based on word embeddings, to identifying token instances of VNICs. Our proposed supervised and unsupervised approache...
Dictionary of Multiword Expressions for Translation into highly Inflected Languages
Treatment of Multiword Expressions (MWEs) is one of the most complicated issues in natural language processing, especially in Machine Translation (MT). The paper presents a dictionary of MWEs for an English-Latvian MT system, demonstrating how MWEs could be handled for inflected languages with rich morphology and rather free word order. The proposed dictionary of MWEs consists of two constit...
ACL-IJCNLP 2009 MWE 2009
The workshop focused on Multi-Word Expressions (MWEs), which represent an indispensable part of natural languages and appear steadily on a daily basis, both novel and already existing but paraphrased, which makes them important for many natural language applications. Unfortunately, while easily mastered by native speakers, MWEs are often non-...
Beyond Words: Deep Learning for Multiword Expressions and Collocations
Deep learning has recently shown much promise for NLP applications. Traditionally, in most NLP approaches, documents or sentences are represented by a sparse bag-of-words representation. There is now a lot of work which goes beyond this by adopting a distributed representation of words, by constructing a so-called “neural embedding” or vector space representation of each word or document. The a...
A System for Compound Noun Multiword Expression Extraction for Hindi
Compound noun multiword expressions are important for many NLP applications like machine translation and information retrieval. This paper describes a system for Hindi compound noun multiword expressions (MWE) extraction from a given corpus. We identify major categories of compound noun MWEs, based on linguistic and psycholinguistic principles. Our extraction methods use various statistical co-...